Voice and Speech Assessment From Telephone Recordings Using Prosodic Analysis Based on μ-Law-Companded Features

Authors

  • Tino Haderlein
  • Anne Schützenberger
  • Michael Döllinger
  • Elmar Nöth
Abstract

Objective assessment of voice and speech properties via telephone is desirable for rehabilitation purposes. Eighty-two patients who had undergone partial laryngectomy read a standardized text over the phone. Based on these recordings, five experienced raters perceptually assessed speech effort, match of breath and sense units, vocal tone, intelligibility, and overall voice quality. Objective evaluation used the word accuracy and word correctness of a speech recognition system together with a set of prosodic features. The speech recognition system used μ-law features, i.e., modified Mel-Frequency Cepstrum Coefficients (MFCCs). The prosodic features were computed from word hypothesis graphs produced by the speech recognizer. The human-machine correlation between these features and the perceptual evaluation is slightly better for the system based on μ-law features than for the baseline MFCC system.
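The abstract names word accuracy and word correctness without defining them. A minimal sketch follows, assuming the definitions commonly used in speech recognition: with N reference words and S substitutions, D deletions, and I insertions from a minimum-edit alignment, word correctness is (N - S - D) / N and word accuracy is (N - S - D - I) / N. The function names and the example sentences are hypothetical, and the paper's recognizer may count errors differently.

```python
def align_counts(reference, hypothesis):
    """Count (substitutions, deletions, insertions) in a minimum-edit alignment.

    Ties between equally cheap alignments are broken arbitrarily, so the
    split into S/D/I can vary, while the total error count cannot.
    """
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = (total_errors, subs, dels, ins) for reference[:i] vs hypothesis[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)          # only deletions
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)          # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, k = cost[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diagonal = (t, s, d, k)            # correct word
            else:
                diagonal = (t + 1, s + 1, d, k)    # substitution
            t, s, d, k = cost[i - 1][j]
            deletion = (t + 1, s, d + 1, k)
            t, s, d, k = cost[i][j - 1]
            insertion = (t + 1, s, d, k + 1)
            cost[i][j] = min(diagonal, deletion, insertion)
    _, subs, dels, ins = cost[n][m]
    return subs, dels, ins


def word_correctness(reference, hypothesis):
    """Percentage of reference words recognized correctly (insertions ignored)."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    subs, dels, _ = align_counts(reference, hypothesis)
    return 100.0 * (len(reference) - subs - dels) / len(reference)


def word_accuracy(reference, hypothesis):
    """Like word correctness, but insertions are penalized too (can go below 0)."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    subs, dels, ins = align_counts(reference, hypothesis)
    return 100.0 * (len(reference) - subs - dels - ins) / len(reference)


# Hypothetical example: one substituted word and one inserted word.
ref = "the north wind and the sun".split()
hyp = "the north win and and the sun".split()
wr = word_correctness(ref, hyp)
wa = word_accuracy(ref, hyp)
```

Because word accuracy also penalizes insertions, it is never higher than word correctness for the same pair of word sequences.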

Similar resources

The effect of bilateral subthalamic nucleus deep brain stimulation (STN-DBS) on the acoustic and prosodic features in patients with Parkinson’s disease: A study protocol for the first trial on Iranian patients

Background: The effect of subthalamic nucleus deep brain stimulation (STN-DBS) on voice features in Parkinson’s disease (PD) is controversial. No study has comprehensively evaluated the voice features of PD patients who underwent STN-DBS using acoustic, perceptual, and patient-based assessments. Furthermore, no study has investigated prosodic features before and after DBS in PD. The curren...

Speech-based assessment of PTSD in a military population using diverse feature classes

There is a critical need for detection and monitoring of Post-Traumatic Stress Disorder (PTSD) in both military and civilian populations. Current diagnosis is based on clinical interviews, but clinicians cannot keep up with the growing need. We examined the feasibility of using speech for assessment in a military population. We analyzed recordings of the Clinician-Administered PTSD Scale (CAPS) ...

Voice-based Age and Gender Recognition using Training Generative Sparse Model

Gender recognition and age detection are important problems in telephone speech processing, used to investigate the identity of an individual from voice characteristics. In this paper, a new gender and age recognition system is introduced, based on generative incoherent models learned using sparse non-negative matrix factorization and an atom correction post-processing method. Similar to genera...

Prosodic and Spectral iVectors for Expressive Speech Synthesis

This work presents a study on the suitability of prosodic and acoustic features, with a special focus on i-vectors, in expressive speech analysis and synthesis. For each utterance of two different databases, one of laboratory-recorded acted emotional speech and one an audiobook, several prosodic and acoustic features are extracted. Among them, i-vectors are built not only on the MFCC base, but also on ...

Speech Recognition with mu-Law Companded Features on Reverberated Signals

One of the goals of the EMBASSI project is the creation of a speech interface between a user and a TV set or VCR. The interface should allow spontaneous speech recorded by microphones far away from the speaker. This paper describes experiments evaluating the robustness of a speech recognizer against reverberation. For this purpose a speech corpus was recorded with several different distortion t...

Publication date: 2016